Prediction of novel biomarkers for gastric intestinal metaplasia and gastric adenocarcinoma using bioinformatics analysis
M.R. Eskandarion et al.

Background & aim The histologic and molecular changes from intestinal metaplasia (IM) to gastric cancer (GC) have not been fully characterized. The present study sought to identify alterations in signaling pathways in IM and GC that could predict disease progression and serve as therapeutic targets. Materials & methods Seven gene expression profiles were selected from the GEO database. Differentially expressed genes (DEGs) were identified, and their functional enrichment was analyzed with Enrichr. The STRING database, Cytoscape, Gene Expression Profiling Interactive Analysis (GEPIA), cBioPortal, NetworkAnalyst, the miRWalk database, OncomiR, and a bipartite miRNA-mRNA correlation network were used for downstream analyses of the selected module genes. Results Extracellular matrix (ECM)-receptor interactions (ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, and THBS2) in GC and the PPAR signaling pathway (FABP1, APOC3, APOA1, HMGCS2, PPARA, and PCK1) in IM may play key roles in carcinogenesis and in the progression from IM to GC. Enrichment analysis indicated that the IM-associated genes are closely related to digestion and absorption. The TF-hub gene regulatory network revealed that AR, TCF4, SALL4, and ESR1 were the most important regulators of hub gene expression. hsa-miR-29 may affect the development and prognosis of GC. PTGR1, C1orf115, CRYL1, ALDOB, and SULT1B1 were downregulated in GC and upregulated in IM and might therefore have tumor suppressor activity in GC progression. Conclusion New potential biomarkers and pathways involved in GC and IM were identified that are important for the transition from IM to gastric adenocarcinoma and may serve as therapeutic targets for GC.


Introduction
Gastric cancer (GC) is the fifth most commonly diagnosed cancer (5.7 % of cases) and the third leading cause of cancer-related death (8.2 %), and nearly ninety percent of GC cases are adenocarcinomas (AC) [1]. Patients with primary GC are treated with endoscopic resection or with gastrectomy plus lymphadenectomy, and systemic therapy is the standard treatment for patients with unresectable or recurrent GC [2]. Advanced endoscopic methods are considered more reliable than conventional diagnostic tools for detecting GC; however, endoscopy is limited by its invasiveness and cost [3]. Serum tumor markers, such as carcinoembryonic antigen (CEA), CA-125, and CA19-9, are also commonly used in the management of GC patients, but they are not sufficient for detecting the disease or determining prognosis [4]. Novel diagnostic techniques are therefore needed, and new biomarkers would help in designing molecular methods for the early diagnosis and monitoring of patients. Many bioinformatic and genomic studies have identified candidate biomarkers in this field; however, many of these biomarkers have not been validated experimentally, so more comprehensive studies are needed.
To this end, the pathway of gastric carcinogenesis must first be considered. Intestinal metaplasia (IM) is a precancerous condition of gastric adenocarcinoma (GAC) associated with an increased risk of developing GC. IM is a transdifferentiation process in which the gastric epithelium acquires an intestinal phenotype; it is mostly induced by H. pylori infection and involves expression of the homeobox genes CDX1 and CDX2. IM is thought to be a protective response to inflammation, but it also increases the risk of neoplastic transformation [5]. The annual incidence of GC in patients with IM is 0.13 %-0.25 % [6]. Based on histology, IM can be categorized into two subtypes: "low-risk" complete IM (CIM, type I, small intestinal) and "high-risk" incomplete IM (types II and III, colonic) [7]. According to epidemiological studies, incomplete IM progresses to GC more often than complete IM does [8]. However, the associations between histologic and molecular changes from IM to GC remain controversial, and the genes and pathways involved in this progression are largely unknown. Identifying these genes and molecular processes is therefore important, because they could reveal hub genes involved in tumor progression, potential novel biomarkers, and therapeutic targets.
Accordingly, the current study aimed to identify important genes and signaling pathways involved in the development of GC and to determine key molecular markers in this pathway through a comprehensive bioinformatics analysis.

Data collection
The Gene Expression Omnibus (GEO) database contains many datasets, but only some of them are suitable for identifying hub genes involved in tumor progression from IM to GC. Therefore, five selection criteria were applied:
1. All samples were from Homo sapiens.
2. The data could be analyzed with GEO2R.
3. Each dataset contained more than six samples.
4. Each dataset compared normal subjects with a patient group.
5. For the IM dataset, patients with gastric intestinal metaplasia (GIM) that progressed to GC (GIM-GC), patients with GIM without progression to GC (GIM-NoGC), and healthy controls were included.
Six gene expression profiles on GC (GSE54129, GSE79973, GSE103236, GSE33651, GSE19826, and GSE118916) and one gene expression profile on intestinal metaplasia (GSE78523) were selected from the NCBI-GEO database (https://www.ncbi.nlm.nih.gov/geo/) [9] to identify effective molecular targets and pathways. In addition, GSE93415 (platform GPL19071), which contains miRNA expression data for 20 GC samples and 20 healthy samples, was selected. GSE54129, GSE19826, and GSE79973 are based on the GPL570 platform, and GSE103236, GSE118916, GSE33651, and GSE78523 are based on the GPL4133, GPL15207, GPL2895, and GPL18990 platforms, respectively. For validation of the key genes, the GSE13911, GSE191275, GSE65801, and GSE174237 datasets, based on the GPL570, GPL20301, GPL14550, and GPL16791 platforms, respectively, were used.

Data processing of DEGs
Differentially expressed genes (DEGs) between GC and normal samples and between IM and normal samples were identified with the GEO2R online analysis tool (https://www.ncbi.nlm.nih.gov/geo/geo2r/), using an adjusted P-value < 0.01 and |log2FC| > 1 as the cutoff criteria; the DEGs were then divided into upregulated and downregulated genes. DEGs common to the gene expression profiles were identified with web-based Venn diagram software (http://bioinformatics.psb.ugent.be/cgi-bin/liste/Venn/calculate_venn.htpl). Normalization and data processing were performed with GEO2R, which applies quantile normalization to the expression data.
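The DEG selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the GEO2R implementation: it assumes log2-scale expression values, takes adjusted P-values and log2 fold changes as already computed by an upstream test (GEO2R uses limma for this), and uses toy inputs. The function names `quantile_normalize`, `call_degs`, and `shared_genes` are hypothetical helpers.

```python
from collections import Counter


def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by original order (a simplification;
    limma averages tied ranks instead).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def call_degs(results, adj_p_cutoff=0.01, lfc_cutoff=1.0):
    """Split genes into upregulated and downregulated DEG sets.

    results maps gene -> (adjusted_p, log2fc); a gene is a DEG when
    adjusted_p < adj_p_cutoff and |log2fc| > lfc_cutoff.
    """
    up, down = set(), set()
    for gene, (adj_p, lfc) in results.items():
        if adj_p < adj_p_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).add(gene)
    return up, down


def shared_genes(gene_sets, min_datasets):
    """Genes present in at least `min_datasets` of the given DEG sets (Venn overlap)."""
    counts = Counter(gene for s in gene_sets for gene in s)
    return {gene for gene, c in counts.items() if c >= min_datasets}
```

For example, `call_degs` applied to one dataset's result table returns its up- and downregulated sets, and `shared_genes` with `min_datasets` set to the desired number of datasets gives the corresponding Venn-diagram overlap.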

Enrichment analysis
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to identify the biological processes (BP), molecular functions (MF), cellular components (CC), and signaling pathways associated with the DEGs. Functional analysis of the selected genes was performed with the Enrichr web server [10][11][12][13][14] and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https://david-d.ncifcrf.gov), with an adjusted P-value < 0.05 as the cutoff criterion [15,16]. The functional annotation of the genes in the modules was also performed with Enrichr, and all hub genes related to GC and IM were reanalyzed by KEGG pathway enrichment.
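The over-representation test behind an enrichment cutoff such as adjusted P < 0.05 can be illustrated with a one-sided hypergeometric test followed by Benjamini-Hochberg correction. This is a simplified sketch: Enrichr and DAVID are based on Fisher's exact test or variants of it and add their own scores and background definitions, which are not reproduced here. The gene sets, background size, and helper names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n term genes, N query genes)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, gene_sets, background_size, alpha=0.05):
    """Test `query` (a set of genes) against every term in `gene_sets`.

    Returns (term, overlap, p, adjusted_p) tuples with adjusted_p < alpha,
    sorted by adjusted_p.
    """
    stats = []
    for term, members in gene_sets.items():
        k = len(query & members)
        p = hypergeom_sf(k, background_size, len(members), len(query))
        stats.append((term, k, p))
    adjusted = benjamini_hochberg([p for _, _, p in stats])
    hits = [(t, k, p, a) for (t, k, p), a in zip(stats, adjusted) if a < alpha]
    return sorted(hits, key=lambda row: row[3])
```

Correcting across all tested terms, rather than judging raw P-values, is what keeps the number of false-positive pathways under control when hundreds of GO and KEGG terms are tested at once.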

Protein-protein interaction (PPI) network and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org/) was used to construct the protein-protein interaction (PPI) network, with a minimum combined interaction score of 0.4 as the cutoff criterion [17,18]. Cluster analysis of the DEGs was performed with Cytoscape version 3.7.2 [19]. The Molecular Complex Detection (MCODE) plugin of Cytoscape was used to identify modules in the PPI networks with degree cutoff = 2, node score cutoff = 0.2, maximum depth = 100, and k-score = 2. The CytoHubba plugin of Cytoscape was used to identify hub genes by evaluating the degree, maximum neighborhood component (MNC), maximal clique centrality (MCC), and edge percolated component (EPC) of the resulting network [20].
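The STRING score filter and two of the CytoHubba rankings (degree and MNC) can be sketched as follows, assuming edges are supplied as (protein, protein, combined score) tuples on STRING's 0-1 scale. MNC is taken here as the size of the largest connected component among a node's neighbors; MCC and EPC are not implemented, and `consensus_hubs` (which intersects the top-k lists, as the Venn step does) is a hypothetical helper, not part of CytoHubba.

```python
from collections import defaultdict, deque


def build_graph(edges, min_score=0.4):
    """Undirected adjacency sets from (a, b, combined_score) tuples.

    Edges scoring below `min_score` (STRING 0-1 scale) and self-loops are dropped.
    """
    adj = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_component_size(nodes, adj):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v] & nodes:  # only edges inside the neighborhood
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def mnc(adj):
    """Maximum neighborhood component score of every node."""
    return {v: largest_component_size(set(adj[v]), adj) for v in adj}


def top_k(scores, k):
    """Top-k nodes by score; ties are broken alphabetically for reproducibility."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {v for v, _ in ranked[:k]}


def consensus_hubs(adj, k):
    """Nodes ranked in the top k by both degree and MNC."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return top_k(degree, k) & top_k(mnc(adj), k)
```

Taking the intersection of several rankings, rather than a single centrality, favors genes that are central under more than one definition of importance.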

Survival analysis of hub genes
The association between the expression of the core genes and overall survival was evaluated with the Kaplan-Meier plotter [21,22]. The expression levels of the core genes were determined with Gene Expression Profiling Interactive Analysis (GEPIA) [23], a web-based tool for analyzing expression data from the TCGA and GTEx databases. The hazard ratio (HR) with 95 % confidence interval (95 % CI) and the log-rank P-value were calculated and plotted.
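The survival statistics reported by the Kaplan-Meier plotter and GEPIA rest on the Kaplan-Meier estimator and the log-rank test, sketched below for two expression groups (for example, high versus low expression split at a cutoff). This is an illustrative implementation only: the hazard ratio, which these tools typically estimate with a Cox proportional hazards model, is not computed, and the input format (follow-up times plus 1/0 event indicators) is an assumption.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                  # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)     # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)            # deaths at t, both groups
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square (1 df) upper-tail probability
```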

Mutation analysis of hub genes
cBioPortal (http://cbioportal.org) [24,25] was used for mutation analysis of the hub genes involved in GC and IM. The following GC studies available through cBioPortal were analyzed: whole-genome sequencing of 147 GC tumors and matched normal tissues; courtesy of OncoSG [26], whole-genome sequencing of 100 GC tumor-normal pairs performed at the University of Hong Kong and Pfizer [27]; sequencing of 478 samples from TCGA Stomach Adenocarcinoma, with source data from GDAC Firehose (previously known as TCGA Provisional) [28]; whole-exome sequencing of 30 diffuse-type gastric adenocarcinoma samples with matched normal samples from the University of Tokyo [29]; and exome sequencing of 22 GC samples with matched normal samples [30].
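The five studies listed above add up to the 777 samples used in the mutation analysis, and a per-gene alteration frequency is the fraction of profiled samples with at least one alteration in that gene. The sketch below shows both under simplifying assumptions: the study labels are descriptive only, and the (sample, gene, alteration type) record format and the `alteration_frequency` helper are hypothetical, not the cBioPortal API.

```python
from collections import defaultdict

# Sample counts of the five GC studies described above (labels are descriptive only).
STUDY_SIZES = {
    "WGS, 147 GC tumors": 147,
    "OncoSG (HKU / Pfizer WGS)": 100,
    "TCGA STAD (GDAC Firehose)": 478,
    "University of Tokyo, diffuse-type WES": 30,
    "Exome, 22 GC samples": 22,
}


def alteration_frequency(records, n_samples):
    """Percentage of samples with at least one alteration, per gene.

    records: iterable of (sample_id, gene, alteration_type) tuples.
    Several alterations of one gene in the same sample are counted once.
    """
    altered = defaultdict(set)
    for sample, gene, _kind in records:
        altered[gene].add(sample)
    return {gene: 100.0 * len(samples) / n_samples for gene, samples in altered.items()}


total_samples = sum(STUDY_SIZES.values())  # 147 + 100 + 478 + 30 + 22
```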

Gene regulatory network analysis
NetworkAnalyst (https://www.networkanalyst.ca/) [31,32] was used to construct a regulatory network of the hub genes and transcription factors (TFs) based on the ChEA database; TFs with an adjusted P-value < 0.05 were visualized in Cytoscape.
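Selecting TFs from the enrichment output and counting how many hub genes each regulates (the degree used in the network figure) can be sketched as follows. The input layout (TF mapped to an adjusted P-value and a set of target hub genes), the TF names in the example, and the `tf_network` helper are hypothetical; NetworkAnalyst exports its own formats, which are not reproduced here.

```python
from collections import defaultdict


def tf_network(enrichment, alpha=0.05, min_degree=2):
    """Build TF -> hub-gene edges from TF enrichment results.

    enrichment: dict mapping TF -> (adjusted_p, set of target hub genes).
    Keeps TFs with adjusted_p < alpha and returns (edges, multi_target_tfs),
    where multi_target_tfs regulate at least `min_degree` hub genes,
    ordered by decreasing degree.
    """
    edges = [
        (tf, gene)
        for tf, (adj_p, targets) in sorted(enrichment.items())
        if adj_p < alpha
        for gene in sorted(targets)
    ]
    degree = defaultdict(int)
    for tf, _gene in edges:
        degree[tf] += 1
    multi_target = sorted(
        (tf for tf, d in degree.items() if d >= min_degree),
        key=lambda tf: (-degree[tf], tf),
    )
    return edges, multi_target
```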

Potential miRNA-mRNA interactions
Differentially expressed miRNAs (DEmiRs) between GC and normal tissues in GSE93415 were identified with GEO2R, using P-value < 0.05, adjusted P-value < 0.01, and |log2FC| > 1 as the cutoff criteria. The miRWalk database was then used to identify the target genes of the DEmiRs, and the genes common to these targets and the selected module genes were identified with a Venn diagram. Finally, a bipartite miRNA-mRNA correlation network was constructed and assessed in Cytoscape [33]. OncomiR (http://www.oncomir.org/) [34] was used to compare microRNA expression between normal tissues and cancer tissues of different stages and grades.
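The miRNA filtering and the construction of the bipartite network can be sketched with set operations. This assumes the DEmiR statistics and the miRWalk/miRTarBase target lists have already been loaded into dictionaries; the miRNA identifiers in the example are placeholders, and `min_hub_genes` (the number of module genes a miRNA must target to count as shared) is a hypothetical parameter.

```python
def de_mirnas(stats, p_cut=0.05, adj_p_cut=0.01, lfc_cut=1.0):
    """stats: miRNA -> (p, adjusted_p, log2fc). Returns miRNAs passing all three cutoffs."""
    return {
        mirna
        for mirna, (p, adj_p, lfc) in stats.items()
        if p < p_cut and adj_p < adj_p_cut and abs(lfc) > lfc_cut
    }


def bipartite_network(targets, module_genes, demirs, min_hub_genes=2):
    """miRNA-mRNA edges restricted to DE miRNAs and module genes.

    targets: miRNA -> set of target genes (e.g. from a target database).
    Returns (edges, shared), where `shared` lists miRNAs targeting at least
    `min_hub_genes` module genes.
    """
    edges = sorted(
        (mirna, gene)
        for mirna in demirs
        for gene in targets.get(mirna, set()) & module_genes
    )
    per_mirna = {}
    for mirna, gene in edges:
        per_mirna.setdefault(mirna, set()).add(gene)
    shared = sorted(m for m, genes in per_mirna.items() if len(genes) >= min_hub_genes)
    return edges, shared
```

The edge list can be imported into Cytoscape as a two-column table, with miRNAs and genes forming the two node classes of the bipartite network.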

Identification of DEGs
As shown in the study flow diagram (Fig. 1), six gene expression datasets (GSE54129, GSE79973, GSE103236, GSE33651, GSE19826, and GSE118916) were selected from NCBI-GEO for GC, and one dataset (GSE78523) was selected for intestinal metaplasia. The GSE93415 dataset provided miRNA expression data, and the GSE13911, GSE191275, GSE65801, and GSE174237 datasets were selected for validation of key genes (Table 1). An adjusted P-value < 0.01 and |log2FC| > 1 were used as thresholds in all analyses. A total of 7795 DEGs were detected: 54 upregulated and 28 downregulated genes in GSE103236, 851 upregulated and 822 downregulated genes in GSE118916, 86 upregulated and 917 downregulated genes in GSE19826, 76 upregulated and 622 downregulated genes in GSE33651, 1859 upregulated and 2078 downregulated genes in GSE54129, and 145 upregulated and 257 downregulated genes in GSE79973. In GSE93415, 221 upregulated and 223 downregulated DEmiRs were identified. A Venn diagram showed that 98 upregulated and 126 downregulated DEGs overlapped among the three datasets (Table 2 and Supplementary Figs. 1A and B). In the validation datasets (GSE13911, GSE191275, GSE65801, and GSE174237), 45687 DEGs (23385 upregulated and 22302 downregulated) were identified. The common DEGs between all datasets for GC and IM are shown in Table 3 and Supplementary Fig. 1C.

Table 1
The details of the GEO datasets.
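As an arithmetic check, the per-dataset counts reported above sum to the stated totals. Note that 7795 is a sum over datasets, so a gene that is differentially expressed in several datasets is counted once per dataset; it is not the number of unique genes.

```python
# Per-dataset DEG counts reported above: (upregulated, downregulated).
DEG_COUNTS = {
    "GSE103236": (54, 28),
    "GSE118916": (851, 822),
    "GSE19826": (86, 917),
    "GSE33651": (76, 622),
    "GSE54129": (1859, 2078),
    "GSE79973": (145, 257),
}
# Validation datasets combined: (upregulated, downregulated).
VALIDATION_COUNTS = (23385, 22302)

per_dataset = {gse: up + down for gse, (up, down) in DEG_COUNTS.items()}
total_degs = sum(per_dataset.values())     # summed over datasets, not unique genes
total_validation = sum(VALIDATION_COUNTS)
```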

Functional enrichment analysis
Enrichr, a comprehensive web server for gene set enrichment analysis, was used for functional enrichment analysis. GO analysis showed that the DEGs were significantly enriched in BP, CC, and MF terms for GC (Supplementary Table 1) and intestinal metaplasia (Supplementary Table 2). Enriched KEGG pathways were also analyzed to identify important pathways associated with the DEGs. As shown in Table 4, the upregulated DEGs in GC were enriched in protein digestion and absorption, ECM-receptor interaction, and focal adhesion, whereas the downregulated DEGs in GC were enriched in gastric acid secretion, metabolism of xenobiotics by cytochrome P450, and glycerolipid metabolism. KEGG pathway enrichment analysis of IM showed that the upregulated DEGs in IM were enriched in bile secretion, digestion and absorption, mineral absorption, the PPAR signaling pathway, and protein digestion and absorption.

Table 2
Venn diagram results from six GEO datasets for GC and one dataset for IM (columns: DEGs, gene names).

PPI and modular analysis and the identification of hub genes
The PPI networks constructed with the STRING 11.0 database and analyzed in Cytoscape 3.8.0 comprised 129 nodes and 507 edges for the 129 DEGs in GC and 411 nodes and 1484 edges for the 411 DEGs in IM (Supplementary Fig. 2 (A-F)). The CytoHubba plugin in Cytoscape was used to determine the top 20 genes based on the degree, MNC, MCC, and EPC methods. The Venn diagram of the three methods revealed 18 hub genes related to GC (Table 5 and Supplementary Fig. 3 (A, B)), all of which were upregulated in GC and IM.

Hub gene enrichment
All core genes related to GC and IM were reanalyzed by KEGG pathway enrichment. Eight genes (ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, THBS2, and THBS1) were significantly enriched in the ECM-receptor interaction pathway in GC (P < 0.05, Fig. 2A), and six genes (FABP1, APOC3, APOA1, HMGCS2, PPARA, and PCK1) were enriched mainly in the PPAR signaling pathway in IM (P < 0.05, Fig. 2B). These genes were therefore selected as the hub genes of GC and IM for further analysis.

Table 5
The top core genes screened in GC and IM based on the degree, MNC, MCC, and EPC.

Expression analysis of core genes in GC and IM
The expression of the eight hub genes selected in the core gene analysis step was verified with the GEPIA database. All of these genes except THBS1 were upregulated in GC and IM, and the expression levels of FABP1, PPARA, and PCK1 were higher than those in normal tissues (P < 0.05, Fig. 3A-B).

Prognosis and survival rates of the core genes in GC and IM
The Kaplan-Meier plotter was used to determine the prognostic value of the hub genes related to GC and IM. COL1A1 (P = 8.9e-05), COL1A2 (P = 0.0015), COL4A1 (P = 5.5e-07), FN1 (P = 1.1e-05), COL6A3 (P = 0.0015), and THBS2 (P = 1.2e-06) were significantly associated with poor survival in patients with GC, whereas THBS1 (P = 0.073) was not significantly associated with survival. ITGB1 (P = 0.0049) was associated with favorable overall survival in patients with GC (Fig. 4A). Among the IM hub genes, FABP1 (P = 0.022), APOC3 (P = 0.0034), and APOA1 (P = 0.0012) were significantly associated with prognosis, whereas HMGCS2 (P = 0.11), PPARA (P = 0.23), and PCK1 (P = 0.16) were not (Fig. 4B).

Mutational analysis of hub genes involved in GC and IM
Mutational analysis of 777 samples from five studies showed that COL1A2, COL4A1, and COL6A3 were the most frequently mutated GC hub genes (Supplementary Fig. 4A). Among the IM hub genes, PCK1 was the most frequently altered, mainly through amplification (Supplementary Fig. 4B).

Gene regulatory network analysis
TFs showing an adjusted P-value<0.05 in ChEA via NetworkAnalyst were visualized by using Cytoscape to further understand the regulatory network between TFs and hub genes (Fig. 5A,B and Table 6).

Bipartite miRNA-mRNA network analysis
To investigate the role of the selected hub genes (ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, THBS2, THBS1, and PCK1) in the progression from IM to adenocarcinoma, miRNAs that might target these genes were predicted with the miRWalk database, using the miRTarBase filter. Evaluation of the miRNAs shared among these genes identified eight miRNAs that are important in IM and GC (Table 7 and Fig. 6). Analysis of miRNA expression in GC and normal tissues with OncomiR showed that the expression of hsa-miR-29b-3p, hsa-miR-29c-3p, hsa-let-7g-5p, hsa-miR-218-5p, and hsa-miR-29a-3p was significantly associated with tumorigenesis (Table 8).

Fig. 4. Kaplan-Meier analysis of the prognostic value of the eight core genes in GC (A) and the six hub genes in IM (B); P < 0.05 indicates a significant association with survival.

Fig. 5. (A) Transcriptional regulatory network (TRN) of the hub genes, comprising 66 nodes and 78 edges; several hub genes were regulated by TFs with a degree ≥2. (B) The network after applying the gastric tissue filter.

Table 6
Transcription factors associated with the hub genes.

Table 7
The miRNAs involved in the hub genes.

Using a public dataset to validate hub genes in GC
To confirm the reproducibility of the selected hub genes and the reliability of the integrated datasets, validation was performed with four independent datasets (GSE13911, GSE191275, GSE65801, and GSE174237), in which 45687 DEGs (23385 upregulated and 22302 downregulated) were detected. A Venn diagram showed that 621 DEGs were common to these datasets, and enrichment analysis showed that ECM-receptor interaction was significantly enriched (P < 0.05). We also compared the expression of the hub genes between tumor and normal gastric tissues using matched TCGA normal and GTEx data and found that all hub genes (ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, THBS2, and THBS1) were significantly upregulated in GC tissues compared with normal tissues in this validation series (Supplementary Fig. 5). These results indicate that the analysis of the discovery datasets (GSE54129, GSE79973, GSE103236, GSE33651, GSE19826, and GSE118916) yielded reproducible results regarding the progression from intestinal metaplasia to gastric adenocarcinoma.

Discussion
The association between histological and molecular alterations from IM to GC has not been fully characterized. Using high-throughput data to pinpoint alterations in the specific signaling pathways involved in the transition from IM to adenocarcinoma may help predict disease progression and enable early diagnosis of GC.
In this study, six gene expression datasets for GC (198 tumor samples and 82 nontumor samples) and one dataset for IM (14 IM samples and 15 healthy samples) were analyzed, and a total of 7795 DEGs were identified. A Venn diagram revealed that 98 upregulated and 126 downregulated DEGs overlapped among at least four datasets. We then performed functional enrichment analysis, protein-protein interaction (PPI) and module analyses, and hub gene identification to predict potential therapeutic targets by examining genome-wide aberrations. Our results suggest that ECM-receptor interactions in GC and the PPAR signaling pathway in IM might contribute to both carcinogenesis and the progression from IM to gastric adenocarcinoma. We therefore focused on ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, THBS2, and THBS1 as hub genes in GC and on FABP1, APOC3, APOA1, HMGCS2, PPARA, and PCK1 as hub genes in IM.
Many bioinformatic studies of GC have reported that the ECM is a key pathway in GC [35], and ECM remodeling may have a profound impact on both the progression and the prognosis of cancer [36]. However, understanding the molecular changes from IM to GC requires considering the precancerous cascade. Our analysis of public clinical data indicated that the ECM components ITGB1, COL1A1, COL1A2, COL4A1, FN1, COL6A3, and THBS2 were overexpressed in GC and significantly associated with poor prognosis, except ITGB1, which was associated with favorable overall survival. Additionally, COL1A2, COL4A1, and COL6A3 had the highest mutation rates.
ITGB1 (integrin β1, CD29) was recently shown to be a prognostic biomarker correlated with immunosuppression in GC [37], and previous studies have suggested that ITGB1 may help maintain resistance to cancer chemotherapy [38]. We therefore focused on predicting the role of ITGB1 and its associated signaling pathways in GC progression. The current results suggest that ITGB1 could play a significant role in GC progression and might serve as a prognostic predictor and therapeutic target for GC.
Collagen genes such as COL1A1 and COL4A1 have previously been reported to be overexpressed in GC, to be closely related to overall survival in patients with GC, and to be risk factors for poor prognosis [39,40]. In the present study, the collagen genes COL1A1, COL1A2, COL4A1, and COL6A3 were identified as hub genes; collagen gene expression in human gastric lesions has been reported to distinguish malignant from premalignant lesions, supporting these genes as candidate predictive biomarkers and/or therapeutic targets for GC [41].
FN1 (fibronectin 1) has consistently been identified as a key gene in microarray-based bioinformatics analyses of GC, and, consistent with previous studies, it was also a hub gene in the present study. High FN1 expression in GC cells has been associated with poor prognosis, whereas high FN1 expression in the ECM did not predict overall survival (OS) but correlated with tumor progression [42]. High FN1 expression may likewise be a tumor biomarker contributing to invasive breast cancer, pancreatic ductal adenocarcinoma (PDAC), and renal cell carcinoma (RCC) [43][44][45], and blocking the FN1 signaling pathway can inhibit tumor cell migration [46]. In contrast, another study reported that FN1, which is highly expressed in cancer cells, may act as a tumor suppressor [47].
Thrombospondin-2 (THBS2), a member of the matricellular calcium-binding glycoprotein family, interacts with growth factors, cell-surface receptors, and the extracellular matrix (ECM) and contributes to cell proliferation, adhesion, and apoptosis [48]. THBS2 may also be useful for the detection of colon cancer [49], lung cancer [50], and GC [51]. Moreover, the hub genes reported in this study are primarily involved in ECM-receptor interaction, protein digestion and absorption, focal adhesion, and the PI3K-Akt signaling pathway, which are also activated in various cancers [52]. These results may point to new therapeutic approaches for the clinical treatment of GC.
Intestinal-type GC is preceded by a cascade of precancerous lesions known as the Correa cascade [36]. With respect to precancerous lesions and cancer development, PPAR signaling has been shown to be relevant in IM: PPAR proteins show antitumor effects and are expressed in normal mucosa with IM adjacent to cancer [53]. Targeting PPAR signaling may therefore represent a strategy to prevent IM from developing into cancer.
The present work identified FABP1, APOC3, APOA1, HMGCS2, PPARA, and PCK1 as important biomarkers in IM that have received little attention; they may offer a viable strategy for targeted GC therapy and provide new insights into the mechanism of GC development.
Fatty acid-binding protein 1 (FABP1) may play an important role in fatty acid metabolism; it is overexpressed in various cancers and promotes tumor angiogenesis and migration [54]. FABP1 expression correlates with the extent of invasion and may reflect GC progression and predict the risk of peritoneal metastasis of GC [55]. Phosphoenolpyruvate carboxykinase 1 (PCK1) is a rate-limiting enzyme of gluconeogenesis, but its role in tumor metabolism and GC progression is unknown [56]. In our Kaplan-Meier analysis, PCK1 expression was not associated with prognosis, possibly because the analysis was performed on GC patients rather than on patients whose IM progressed to GC.
Enrichment analysis of the IM samples indicated that the upregulated genes were mostly involved in fat digestion and absorption, cholesterol metabolism, vitamin digestion and absorption, bile secretion, and glycolysis/gluconeogenesis, all of which are closely related to digestion and absorption. These findings suggest that changes in gastric digestive function and eating habits may be closely related to the transition from healthy tissue to IM.
A regulatory network of hub genes and their TFs was constructed to elucidate the molecular mechanisms of GC. TP53, NR1H3, DMRT1, EZH2, JUN, AR, CLOCK, TCF4, SALL4, NFE2L2, SOX9, TET1, TP63, SOX2, ESR1, and PPARG were predicted to regulate the expression of the hub genes, with AR, TCF4, SALL4, and ESR1 being the most important in this regulatory network. Complex interactions between hub genes and TFs may therefore contribute significantly to the development of GC.
miRNAs regulate gene expression by inhibiting the translation of target mRNAs or by promoting their cleavage [57]. This study revealed that hsa-miR-29 may affect both the development and prognosis of GC by regulating hub genes. Consistent with this, low expression of miR-29a in GC is associated with aggressive cancer biology and a decreased survival rate [58], and miRNA-29 transcript levels decline dramatically in several types of cancer [59,60]. miRNA-29 acts as a hub integrating key signaling pathways, such as nuclear factor-κB signaling, the cell cycle, apoptosis, and epithelial-mesenchymal transition (EMT) [61][62][63][64].
However, depending on its upstream regulators and target genes, miR-29 can act either as a commonly downregulated tumor suppressor or as an oncogene in different kinds of cancer [65], and it is an important regulator in many human cancers. miR-29a contributes to the development of GC, and miRNA-29 appears to have a significant effect on both the development and prognosis of GC [66,67]; however, because of this context-dependent behavior, the use of miR-29 as a biomarker and the development of miR-29a-based therapies for GC and its stages require further validation.
The identification of candidate tumor suppressor genes was one of the major achievements of this study. Comparison of the DEGs in GC and IM showed that several genes (such as PTGR1, C1orf115, CRYL1, ALDOB, and SULT1B1) were differentially expressed in both conditions, being downregulated in GC and upregulated in IM. Given these opposite expression patterns, these genes may act as tumor suppressors during GC progression.
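The comparison behind these candidates amounts to selecting genes whose fold changes have opposite signs in the two comparisons. A minimal sketch, assuming each input dictionary maps a gene to its log2 fold change versus normal tissue and contains only genes that passed the DEG cutoffs; `opposite_direction_genes` is a hypothetical helper, and the fold-change values in the example are toy numbers, not results from this study.

```python
def opposite_direction_genes(gc_lfc, im_lfc):
    """Split genes shared by both comparisons by the sign of their fold changes.

    gc_lfc / im_lfc: gene -> log2 fold change versus normal tissue.
    Returns (down_in_gc_up_in_im, up_in_gc_down_in_im), each sorted.
    """
    shared = gc_lfc.keys() & im_lfc.keys()
    down_gc_up_im = sorted(g for g in shared if gc_lfc[g] < 0 < im_lfc[g])
    up_gc_down_im = sorted(g for g in shared if gc_lfc[g] > 0 > im_lfc[g])
    return down_gc_up_im, up_gc_down_im


# Toy fold changes for illustration only.
gc_example = {"PTGR1": -2.1, "ALDOB": -3.0, "COL1A1": 2.5, "GENE_X": -1.2}
im_example = {"PTGR1": 1.4, "ALDOB": 2.0, "COL1A1": 1.1, "GENE_Y": 1.5}
candidates, reverse = opposite_direction_genes(gc_example, im_example)
```

Genes that are upregulated in both conditions (such as the toy COL1A1 entry) are excluded, since only a reversal of direction between IM and GC suggests loss of a protective function.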
Prostaglandin reductase 1 (PTGR1) is a rate-limiting enzyme in the arachidonic acid pathway that is primarily involved in the inactivation of prostaglandins and several eicosanoids, such as leukotriene B4. Although its function in GC has not been studied, PTGR1 has been shown to be involved in the progression of many cancers [68][69][70], and it has been suggested that PTGR1 may be involved in cancer cell proliferation and may be a potential target for cancer treatment [68]. The sulfotransferases (SULTs), a family of enzymes that catalyze the sulfonation of various endogenous and exogenous substrates, include membrane-bound and cytosolic SULTs [71,72]. SULT1B1 is expressed at the highest levels in the intestine but is also present in moderate amounts in the liver, kidney, and white blood cells [73]. However, the effect of SULT1B1 on GC has not yet been identified, and its potential role in GC progression warrants investigation.
Furthermore, little is known about the roles of ALDOB, CRYL1, and C1orf115 in human cancer, especially GC, so future studies on these genes are needed. Further research and experiments on COL1A2, COL4A1, and COL6A3 in GC, PCK1 in IM, and hsa-miR-29 will be particularly important. Our future work will aim to confirm experimentally the correlation between these core genes and GC.
Despite its strengths, the current study has limitations that warrant consideration: (1) further experiments are needed to complement the bioinformatics analysis; (2) basic characteristics such as sex, age, sample size, tumor grade and stage, and potential confounding factors were not considered; (3) although seven datasets were included, no definitive results could be achieved; and (4) the 12 GEO datasets used were each divided into upregulated and downregulated genes and intersected with Venn diagrams before the downstream analyses, so the roles of some genes that act together may have been missed.

Conclusion
This study identified new potential biomarkers and pathways in GC and IM that are important for the progression from IM to gastric adenocarcinoma and may serve as therapeutic targets in GC. ECM-receptor interactions in GC and the PPAR signaling pathway in IM may play key roles in this progression. COL1A2, COL4A1, and COL6A3 were overexpressed in GC, were associated with poor prognosis, and had the highest mutation rates. FABP1, APOC3, APOA1, HMGCS2, PPARA, and PCK1 are important biomarkers in IM that have received little attention. Changes in gastric digestive function and eating habits may be closely related to the transition from healthy tissue to IM. AR, TCF4, SALL4, and ESR1 were the most important TFs in the regulatory network controlling hub gene expression in GC. hsa-miR-29 may affect the development and prognosis of GC by regulating hub genes, and PTGR1, C1orf115, CRYL1, ALDOB, and SULT1B1 may be tumor suppressor genes involved in GC progression. However, these conclusions require further confirmation in a series of clinical experiments.

Fig. 1. A flow diagram of the study. GC: Gastric Cancer; IM: Intestinal Metaplasia; GEO: Gene Expression Omnibus; DEG: Differentially Expressed Gene; DAVID: Database for Annotation, Visualization, and Integrated Discovery; STRING: Search Tool for the Retrieval of Interacting Genes; KEGG: Kyoto Encyclopedia of Genes and Genomes; MCODE: Molecular Complex Detection; GEPIA: Gene Expression Profiling Interactive Analysis.

Fig. 2. Analysis of the top genes in GC (A) and IM (B) by KEGG pathway enrichment.

Fig. 6. Bipartite miRNA-mRNA regulatory network of important miRNAs involved in GC progression from IM to adenocarcinoma.

Table 3
All common DEGs found between GC and IM.

Table 4
KEGG pathway enrichment in GC and IM.

Table 8
Expression levels of five microRNAs in GC tissues and normal tissues.